11 research outputs found

    Predicting tissue specific cis-regulatory modules in the human genome using pairs of co-occurring motifs

    Get PDF
    <p>Abstract</p> <p>Background</p> <p>Researchers seeking to unlock the genetic basis of human physiology and diseases have been studying gene transcription regulation. The temporal and spatial patterns of gene expression are controlled by mainly non-coding elements known as cis-regulatory modules (CRMs) and epigenetic factors. CRMs modulating related genes share the regulatory signature which consists of transcription factor (TF) binding sites (TFBSs). Identifying such CRMs is a challenging problem due to the prohibitive number of sequence sets that need to be analyzed.</p> <p>Results</p> <p>We formulated the challenge as a supervised classification problem even though experimentally validated CRMs were not required. Our efforts resulted in a software system named CrmMiner. The system mines for CRMs in the vicinity of related genes. CrmMiner requires two sets of sequences: a mixed set and a control set. Sequences in the vicinity of the related genes comprise the mixed set, whereas the control set includes random genomic sequences. CrmMiner assumes that a large percentage of the mixed set is made of background sequences that do not include CRMs. The system identifies pairs of closely located motifs representing vertebrate TFBSs that are enriched in the training mixed set consisting of 50% of the gene loci. In addition, CrmMiner selects a group of the enriched pairs to represent the tissue-specific regulatory signature. The mixed and the control sets are searched for candidate sequences that include any of the selected pairs. Next, an optimal Bayesian classifier is used to distinguish candidates found in the mixed set from their control counterparts. Our study proposes 62 tissue-specific regulatory signatures and putative CRMs for different human tissues and cell types. These signatures consist of assortments of ubiquitously expressed TFs and tissue-specific TFs. Under controlled settings, CrmMiner identified known CRMs in noisy sets up to 1:25 signal-to-noise ratio. CrmMiner was 21-75% more precise than a related CRM predictor. The sensitivity of the system to locate known human heart enhancers reached up to 83%. CrmMiner precision reached 82% while mining for CRMs specific to the human CD4<sup>+ </sup>T cells. On several data sets, the system achieved 99% specificity.</p> <p>Conclusion</p> <p>These results suggest that CrmMiner predictions are accurate and likely to be tissue-specific CRMs. We expect that the predicted tissue-specific CRMs and the regulatory signatures broaden our knowledge of gene transcription regulation.</p

    JavaDD: a Declarative Debugger for Java

    No full text
    This paper presents a declarative approach to the debugging of object-oriented programs and illustrates the methodology through an extension of a novel interactive visualization system for Java developed in our previous research. Unlike traditional “procedural ” debugging, we use the term “declarative debugging ” to refer to a flexible set of queries on individual execution states and also over the entire history of execution (or portion of the history). Examples include queries to find all values assigned to a variable over its life-time; which variable has a certain value; the calling sequence that results in a certain outcome; whether a certain statement was executed; etc. These queries were arrived at by a systematic study of errors in objectoriented programs in our previous research. Our proposed system, JavaDD, maintains the execution history as a relational database of salient events, such as method call/return, thread start/end, variable assignment, etc. An important property of our approach is that these queries can be posed interactively (at any step of execution), and there is no need to develop a compiler to instrument the source code, as in related research projects. Furthermore, we also sketch a visual interface so that both queries and answers can be composed using inituitive object and sequence diagrams. We believe such an approach is a significant contribution to the art of program debugging. We present the architecture of JavaDD, a detailed catalog of our queries and their translation, and several examples illustrating the approach. We also compare our approach related research efforts in the area of query-based analysis of object-oriented programs

    HebbPlot: an intelligent tool for learning and visualizing chromatin mark signatures

    No full text
    Abstract Background Histone modifications play important roles in gene regulation, heredity, imprinting, and many human diseases. The histone code is complex and consists of more than 100 marks. Therefore, biologists need computational tools to characterize general signatures representing the distributions of tens of chromatin marks around thousands of regions. Results To this end, we developed a software tool, HebbPlot, which utilizes a Hebbian neural network in learning a general chromatin signature from regions with a common function. Hebbian networks can learn the associations between tens of marks and thousands of regions. HebbPlot presents a signature as a digital image, which can be easily interpreted. Moreover, signatures produced by HebbPlot can be compared quantitatively. We validated HebbPlot in six case studies. The results of these case studies are novel or validating results already reported in the literature, indicating the accuracy of HebbPlot. Our results indicate that promoters have a directional chromatin signature; several marks tend to stretch downstream or upstream. H3K4me3 and H3K79me2 have clear directional distributions around active promoters. In addition, the signatures of high- and low-CpG promoters are different; H3K4me3, H3K9ac, and H3K27ac are the most different marks. When we studied the signatures of enhancers active in eight tissues, we observed that these signatures are similar, but not identical. Further, we identified some histone modifications — H3K36me3, H3K79me1, H3K79me2, and H4K8ac — that are associated with coding regions of active genes. Other marks — H4K12ac, H3K14ac, H3K27me3, and H2AK5ac — were found to be weakly associated with coding regions of inactive genes. Conclusions This study resulted in a novel software tool, HebbPlot, for learning and visualizing the chromatin signature of a genetic element. Using HebbPlot, we produced a visual catalog of the signatures of multiple genetic elements in 57 cell types available through the Roadmap Epigenomics Project. Furthermore, we made a progress toward a functional catalog consisting of 22 histone marks. In sum, HebbPlot is applicable to a wide array of studies, facilitating the deciphering of the histone code

    Benchmarking of alignment-free sequence comparison methods

    Get PDF
    Background: Alignment-free (AF) sequence comparison is attracting persistent interest driven by data-intensive applications. Hence, many AF procedures have been proposed in recent years, but a lack of a clearly defined benchmarking consensus hampers their performance assessment. Results: Here, we present a community resource (http://afproject.org) to establish standards for comparing alignment-free approaches across different areas of sequence-based research. We characterize 74 AF methods available in 24 software tools for five research applications, namely, protein sequence classification, gene tree inference, regulatory element detection, genome-based phylogenetic inference, and reconstruction of species trees under horizontal gene transfer and recombination events. Conclusion: The interactive web service allows researchers to explore the performance of alignment-free tools relevant to their data types and analytical goals. It also allows method developers to assess their own algorithms and compare them with current state-of-the-art tools, accelerating the development of new, more accurate AF solutions
    corecore